Hashes are not suitable to verify fixity of the public archived web

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the hash values generated on the same resource are identical, then the fixity of the resource is verified. We tested this process by conducting a study on 16,627 mementos from 17 public web archives. We replayed and downloaded the mementos 39 times using a headless browser over a period of 442 days and generated a hash for each memento after each download, resulting in 39 hashes per memento. The hash is calculated by including not only the content of the base HTML of a memento but also all embedded resources, such as images and style sheets. We expected to always observe the same hash for a memento regardless of the number of downloads. However, our results indicate that 88.45% of mementos produce more than one unique hash value, and about 16% (or one in six) of those mementos always produce different hash values. We identify and quantify the types of changes that cause the same memento to produce different hashes. These results point to the need for defining an archive-aware hashing function, as conventional hashing functions are not suitable for replayed archived web pages.


Introduction
Web archives, such as the Internet Archive (web.archive.org), UK Web Archive (webarchive. org.uk/ukwa/), and many others [1] have been established with the goal to preserve the web and allow access to prior states of web pages. One of the main reasons for archiving live web pages is that they often disappear or change over time due to the ephemeral nature of the web. If a web page disappears (e.g., returning "404 Not Found" from the live web) before it is captured by a web archive, then we will not be able to recover the missing web page or have any evidence of its existence in the past. An archived web page, or memento, is a resource on the web "that encapsulates a prior state" of a live web page [2], and is thus fixed at the date and time it was archived. However, we currently do not have any mechanism that allows us to verify the fixity of mementos. For example, if a web page was archived in 1999 and is replayed in 2022, how do we know that the archived version still encapsulates the state of the page as observed in 1999 and has not been modified or tampered with during those 23 years? An example that highlights the importance of verifying the fixity of mementos is the story of Joy-Ann Reid, an American cable television host at MSNBC. In December 2017, she apologized for writing several "insensitive" LGBT blog posts nearly a decade earlier when she was a morning radio talk show host in Florida [3,4]. In April 2018, Reid, supported by her lawyers, claimed that her blog and/or the archived versions of the blog in the Internet Archive had been compromised and the content was fabricated [5]. Even though the Internet Archive denied that their archived pages had been hacked [6], a stronger case could be made if we had an independent service verifying that those archived blog posts had not changed in the archive.
In the context of web archiving, fixity refers to the status of mementos being always fixed and unaltered. The TRAC report [7] indicates that verifying the fixity of archived content is needed to establish trust in digital repositories and archives. It is important for users of web archives, whether human or robot, to have the ability to verify the fixity of archived web pages for several reasons: 1. The number of public and private web archives is increasing [1,8], and we may not have the same level of trust in all of these archives.
3. There are different security threats against web archives [18][19][20][21][22][23] that not only affect accessibility to archived collections but also would change the representation of replayed archived pages over time. These concerns map to the Availability and Integrity components of the CIA security triad [24], respectively. Our main focus in this work is to aid in efforts to detect when Integrity is compromised.
A common technique for verifying fixity on digital resources is to generate a unique string, or a hash value, that represents the content of the resource at a particular time using cryptographic hashing algorithms, such as MD5 or SHA-256. The resulting hash values cannot be converted back to the original content. By periodically generating a new hash value on the content of a resource and comparing it with a previously computed and known to be correct hash, we can detect if the resource has been corrupted or altered.
Our goal in this work is to investigate the feasibility of computing fixity on replayed mementos rather than on the individual WARC files, which contain the original captured data and are located at a web archive. This concept differs from conventional approaches to utilize fixity information of digital files as frequently seen in the digital repository environment. For example, a repository such as GitHub that hosts a digital file also creates its content-based hash value and makes both available for download via the web (Fig 1). If a user is interested in verifying the fixity of the file, they can download it along with its corresponding hash value from GitHub. The user can then apply the same cryptographic hashing algorithm as GitHub to create a hash value of their own. If both hash values match, the user can be confident that the downloaded copy of the file is identical to the copy hosted by GitHub and has not been altered during the download process. This verification step is repeatable and, as long as the file is not altered, will always result in the same outcome.
The picture is different for web archives since neither the raw WARC files nor their corresponding hash values are generally available to the public for download. In order to access the content of a memento, we therefore need to dereference its URI. This implies the involvement of archive-specific replay software that aims to present the archived web sources to the user at is was at the time it was captured. So rather than a simple file transfer via HTTP as in the GitHub scenario above, the replay software acts as an "in-between webserver" that extracts the memento from the WARC file and translates it into a resource that a web browser can display (Fig 2). In this scenario we are left with creating a hash value based on the content of memento as presented to us in a browser via the replay engine. In theory, the same immutable memento served via the same replay engine from the same web archive should result in the same content-based hash value, which would give us at least some sense of fixity. This process would allow users of web archives and independent third parties to verify that replayed archived resources have remained unchanged since their time of capture. Allowing verification by outside parties is important because if the contents of an archive are compromised, then the archive's fixity information may also be compromised.
Some web archives make "raw" (i.e., unmodified) mementos available, which could be used to compute fixity. However, this is not a reliable proposition because: 1) not all archives support raw memento access, 2) since typical users do not interact with raw mementos, it would not represent a verification of a user's experience, and 3) as we detail below, even raw mementos are sometimes transformed by web archives.
Our core finding presented in this paper is that generating repeatable hash values on replayed mementos is much more complicated, if not impossible, given the current landscape of replay software. We categorize and detail the reasons in the below: • Inclusion of Embedded Resources: A memento generally consists of multiple resources embedded in the page. For example, the memento https://web.archive.org/web/ 20170717184643/https://climate.nasa.gov/vital-signs/carbon-dioxide/ consists of 201 images, 19 JavaScript files, 3 CSS files, and the base HTML file; if any of those embedded resources changes, one would say "the page has changed." We refer to such a memento as a composite memento, representing the set of all resources comprising an archived page [25]. In this paper, we are interested in computing the fixity on a composite memento by generating one aggregated hash that represents the composite memento (cf. generating the hash value on only the base HTML file). This is challenging because any small change in the replay of the composite memento over time affecting either the base HTML file or any embedded resource will result in a different hash value.
• Client-side Execution: JavaScript, which runs on the client, can modify the Document Object Model (DOM) [26], dynamically adding or deleting HTML elements. JavaScript can also load more resources on a web page at any time after its initial loading. Even though two mementos may have the same JavaScript source code, the execution of the JavaScript may produce different results (e.g., based on random numbers or browser-specific information), and thus the pages may appear differently when rendered. Our prior work [27] found that about 54.5% of web pages in 2012 included JavaScript to load embedded resources, and the 2022 Web Almanac reports a median of 22 JavaScript files per desktop web page [28] and a median of 509 KB of JavaScript loaded for each desktop web page [29]. Thus, modern web pages contain a lot of JavaScript, and executing JavaScript code impacts generating repeatable hash values on composite mementos. 09:52:15 GMT). At replay time, all URLs are rewritten relative to the base HTML (in the above example, 20180618094908) and when dereferenced, the archive redirects to the temporally closest available memento. This delta in Memento-Datetime can range from zero seconds to many years, and when resources are shared between multiple pages, such as style sheets and images (both of which typically change more slowly than text resources), large deltas are common as a result of the archives optimizing their crawl strategies. (Large Memento-Datetime deltas themselves are not the problem, for example a GIF that never changed after it was published would remain valid even if the delta was measured in years, but a dynamically updated GIF could be invalid even if the delta is measured in seconds [25,32].) The holdings of a web archive are dynamic, and new resources, with varying values of Memento-Datetime, are accessioned all the time, and some resources are even occasionally lost, which means that the deltas can vary over time. In summary, the web archive policy of rewriting embedded resources relative to the Memento-Datetime of the base HTML file and then doing a best effort resolution at the time of replay can result in changes in the composite memento even when the base HTML has not changed.
• Inclusion of Archive-specific Resources: Archives may add their own archival banners in the base HTML of a memento to provide metadata about both the memento being viewed and the original page. Including such archive-specific content (e.g., archival banners), which often contains dynamic information reflecting the current state of the archive, in the hash calculation affects generating repeatable hashes.
In this paper, which is based on Aturban's PhD thesis [33], we wanted to test the standard digital preservation technique of using hashes for verifying fixity. We selected 16,627 mementos from 17 public web archives. Over a period of 14 months, we downloaded each memento 39 times and computed their hash values on each replay (for a total of 648,453 hash calculations). For the reasons outlined above, we expected there would be challenges and that not all mementos would produce stable hash values through time. However, we were surprised at the extent of the changes we discovered: the fixity of the replayed pages changed often enough to effectively negate the utility of conventional hashing techniques for third-party auditing of archived web pages.
Through the downloaded content of the mementos and their resulting hash values, we provide answers to the following questions: • What are the types of changes in the playback of composite mementos that prevent or affect generating repeatable hashes? We compare the resulting 39 consecutive hash values for each composite memento. Each time two consecutive hashes are not identical, we look at the downloaded content to find out which resource (or resources) is causing different hash values.
• How many times does each identified type of change occur? This will help us in the future in defining requirements for generating repeatable hash values.
• For each composite memento, will excluding certain embedded resources that we expect to change over time help to generate repeatable hash values? Would, for example, excluding resources added by the archive produce more consistent hash values?

Background and related work
In this section we briefly explain how live web pages are rendered, and how archives crawl the web and replay mementos. Then, we review related work in trusted timestamping of digital resources, including trusty URIs. Next, we define fixity in the context of digital preservation and why it is important to verify fixity of mementos. Finally, we describe several security issues in web archives with relevance to our work. Although we cover related work in verifying fixity on digital resources, we emphasize that there are no existing solutions to computing or verifying fixity on composite mementos.

Rendering live web pages
HyperText Markup Language (HTML) is the standard markup language that specifies how web pages are structured using a set of defined tags and tag attributes. Web pages written in HTML are accessed from servers via the HTTP protocol. A web browser sends an HTTP request to the server. HTTP request headers are used to support HTTP content negotiation so that the client (e.g., the browser) can indicate which content is preferred, such as a document in a specific language or format. The server handles the HTTP request and sends back an HTTP response to the browser. The returned content includes HTTP response headers and the requested HTML file. Upon receiving the response, the browser renders the returned HTML. In order for the browser to display the entire web page, it will request other resources, such as images, style sheets, and JavaScript files, specified in the HTML by tags like <img>, <link>, and <script>. The number of resources embedded in a web page varies from zero to even hundreds. For example, the front page of CNN (https://www.cnn.com) as of March 2023 contained almost 200 embedded resources. The 2022 Web Almanac reports that the median number of requests to construct a desktop web page is 76 [29]. Web resources comprising a web page might be served from the same server hosting the base HTML file or from any other web server. In general, in response to an HTTP request, a server returns an HTTP response that should consist of the following: 1. The HTTP status code: The status code indicates whether the HTTP request has been successfully handled by the server. For example, the status code 404 Not Found indicates that the requested resource could not be found in the server while 200 OK indicates that the server completed the request.
2. The HTTP response entity headers: The headers contain information about the payload (e.g., Content-Length and Content-Type), the client/server connection (e.g., Keep-Alive and Connection), or the server (e.g., server).
3. The HTTP response entity body: The body is the response payload (i.e., the content of the requested resource). An HTTP response may not contain an entity body, for instance for responses with the HTTP status code 304 Not Modified.

Web archiving
Web archiving is the process of preserving portions of the current web for future generations. Web archiving is not only concerned with collecting and preserving web pages but also with how to provide access to those archived resources. Web archives hold billions of archived web pages [34]. For instance, we expect to find archived copies of a well-known web page (e.g., www.cnn.com) in large web archives, such as the Internet Archive, as these archives try to capture the entire web by employing large-scale web crawlers. Other web archives focus on preserving special collections. For instance, the UK Web Archive was established with the objective of archiving only UK websites (e.g., www.parliament.uk) [35]. Other web archives, such as perma.cc, webcitation.org, and archive.is, capture web pages on demand, so they only preserve pages submitted by users, not through crawling the web. Gomes et al. [36] developed a survey of various web archiving initiatives as of 2011, and the authors created a Wikipedia page [37] that has been used to keep the information updated. Table 1 shows a list of the 17 public web archives used in our study.

Memento protocol
Memento [2,38] is an HTTP protocol extension that uses time as a dimension to access the web by relating current web resources to their prior states. The Memento protocol is supported by most public web archives including the Internet Archive. The protocol introduces two HTTP headers for content negotiation. First, Accept-Datetime is an HTTP Request header through which a client can request a prior state of a web resource by providing the preferred datetime (e.g., Accept-Datetime: Mon, 09 Jan 2017 11:21:57 GMT). Second, the Memento-Datetime HTTP Response header is sent by a server to indicate the datetime at which the resource was captured. The Memento protocol also defines the following terms: • URI-R-an original resource from the live Web • URI-M-an archived version (memento) of the original resource at a particular point in time • URI-T-a resource (TimeMap) that provides a list of mementos (URI-Ms) for a particular original resource A Memento aggregator can be used to retrieve TimeMaps aggregated from multiple web archives. The Memento Aggregator from Los Alamos National Laboratory (LANL) [39,40] is one implementation that provides TimeMaps aggregated from different web archives both with (a) native support of the Memento protocol and (b) by proxy support of the Memento protocol. MemGator [41,42] is an open-source implementation that provides a variety of customization options, such as allowing users to specify a list of web archives to retrieve Time-Maps from, but it only aggregates TimeMaps from archives that natively support the Memento protocol.

Crawling and replaying archived web pages
A web crawler is an automated program that is used by web search engines (e.g., Google and Bing) and web archives to systematically collect and discover web pages (URIs) that exist on the web. The main purpose of crawling the web by search engines (e.g., via Google Googlebot [43]) is to index web pages and understand what the content of each page is about to be able to respond to users with the most relevant web pages to their queries. Web archives use web crawlers, such as the Internet Archive's Heritrix [44], to collect and preserve web pages, and allow access to those archived pages. In general, an archive's web crawler performs the following (simplified) steps to crawl live web pages: 1. Insert a given set of URIs (i.e., seed URIs) in a queue.
2. Select (or dequeue) one URI from the queue.
3. Dereference the web page identified by the selected URI.
4. Write the downloaded content of the web page to a file. The most common file format used by web archives is Web ARChive (WARC) [45], which specifies a set of rules for aggregating multiple web resources (e.g., HTML files, images, and style sheets) with the HTTP request/response entity and headers of each resource in addition to WARC-related metadata into a single file. WACZ [46], a recent extension of WARC that aggregates indexes with the associated WARC contents, is similar in purpose.

Extract any new URIs that
have not yet been crawled or placed in the queue from the downloaded content of the page.
6. Insert the newly discovered URIs in the queue.
7. If the queue is not empty, go to step 2.
The crawling process will result in a set of archived pages. To provide access to their archived pages, many web archives use OpenWayback [47], the open-source implementation of IA's Wayback Machine, to allow users to look up archived pages by submitting URIs. On the replay of an archived page, one of the main tasks of OpenWayback is to ensure that all resources comprising the page (e.g., images, style sheets, and JavaScript files) are retrieved from the archive, not from the live web. Thus, at the time of replaying the page, OpenWayback rewrites all links to those embedded resources to point directly to the archive [48]. In addition to OpenWayback, PyWB [49] is another replay tool, which is used by Perma [50], Webrecorder [51], and an increasing number of web archives, as the International Internet Preservation Consortium (IIPC) recommended in 2020 [52] that its members to transition to PyWB. Although the Internet Archive's Wayback Machine is proprietary software, both OpenWayback and PyWB mimic its design and functionality. Despite the multiple software implementations, the Wayback Machine model of archival replay, using WARC files as input, is the predominant model in public web archives; of the 17 web archives listed in Table 1, only archive.is does not use WARC files and Wayback Machine modalities. While the Wayback Machine model of replay has clearly been successful, researchers have begun to consider how the standardization of WARC files and the Wayback Machine model itself have shaped the field of web archiving (e.g., [53,54]). One of the key assumptions in the Wayback Machine model is that HTTP responses are stored in WARC files, and then replayed through the web archive with a mixture of client-side and server-side transformations to both recreate the past (e.g., rewrite links to point back into the web archive) and "brand" the replayed content as the past web (e.g., archival banners). Although some web archiving projects have been introduced that break with this model of Wayback Machine archival replay-such as Jawa [55], which outright eliminates JavaScript that can be detrimental to replay, and OldWeb.today [56], which emulates a browser within the actual user's web browser and then replays the web content with no transformations at all-the Wayback Machine model of partial transformation remains what users are most likely to encounter when they go to "a web archive." To illustrate how web archives transform the content of an original web page to appropriately replay it in a user's browser, we submitted the web page's URI https://maturban.github. io/playground/index.html to the Internet Archive using the "Save Page Now" feature [57]). The archive captures the root HTML file and all embedded resources included in the page. Table 2 shows the URI-Rs of the original resources comprising the web page and the corresponding URI-Ms to the mementos created by the Internet Archive (the archival datetime of each memento is marked in red). Archives also add banners [58] to provide information about both the memento being viewed and the original page (e.g., the top portion in Fig 3). Table 2. URI-Rs and URI-Ms of example web page. The URI-Rs of the original resources and the URI-Ms of their corresponding mementos for the web page https://maturban.github.io/playground/index.html.

Trusted timestamping of digital resources
Timestamping is recording the date and time of when an event occurs. A "trusted" timestamp is a timestamp initially created and verified by a trustworthy third-party service. Several tools have been developed to generate trusted timestamps. For example, OriginStamp [64] allows users to generate a trusted timestamp using blockchain-based networks on any file, plain text, or a hash value. The data is hashed in the user's browser and the resulting hash is sent to Origi-nStamp's server to be added to a list of all hashes submitted by other users. Once per day, Ori-ginStamp generates a single aggregated hash of all received hashes. This aggregated hash is converted to a Bitcoin address that will be a part of a new Bitcoin transaction. The timestamp associated with the transaction is considered a trusted timestamp. A user can verify a timestamp through OriginStamp's API or by visiting their website.
Other services, such as Chainpoint (chainpoint.org) and OpenTimestamps (opentimestamps.org), are based on the same concept of using blockchain-based networks to timestamp digital documents. Even though users of these services can pass data by value, they are not allowed to submit data by reference (i.e., passing a URI of a web page). In other words, these tools cannot be used to timestamp web pages. The only exception is a service [65] established by OriginStamp that accepts URIs from users, but the service is no longer available on its prior live web address of (https://www.isg.uni-konstanz.de/web-time-stamps/, so we cannot evaluate how well it worked.

Verifying the fixity of digital resources
In the context of digital preservation, fixity is a mechanism to demonstrate that archived resources have remained unaltered since the time they were captured [66]. The final report of the PREMIS Working Group [67] defines information used for fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." To establish trust in repositories and web archives, different publications and standards have emphasized the importance of verifying fixity of archived resources. The Trusted Repositories Audit & Certification (TRAC) report [7] by the Task Force on Archiving of Digital Information introduces criteria for identifying trusted digital repositories. In addition to the ability to reliably provide access, preserve, and migrate digital resources, digital repositories, which include web archives, must create preservation metadata that can be used to verify that content is not tampered with or corrupted (fixity) according to Sections B2.9 and B4.4. The report recommends that preserved content is stored separately from fixity information, so it is less likely that someone is able to alter both the content and its associated fixity information [7].
As illustrated earlier in Fig 2, many web archives compute and store hashes of their archival content. The Internet Archive creates and makes available both the hash of an entire WARC file and the hash of raw mementos (more accurately, the WARC Payload Digest [45]). WARC file hashes are available via the Internet Archive's Web Crawls interface, https://archive.org/ details/web, which provides access to metadata about various web crawls stored at the Internet Archive. This metadata includes MD5 and SHA1 hashes on the individual WARC files generated by a crawl.
The hashes of raw mementos are stored and made available via the Internet Archive's CDX API [68]. However, this hashing of the WARC Payload Digest is sensitive to content-encoding https://doi.org/10.1371/journal.pone.0286879.g005 [69]. That is, if the payload stored was received as compressed (e.g., GZip or Brotli [70]), then the payload digest would be different than if it were in plain text, even if the content served to the client at replay time would be the same (as any stored content-encoding is undone at replay time). This is because different archiving tools (e.g., Heritrix, Wget, Zeno [71], and Brozzler [72]) each have their own preferences of content negotiation, which affects the content-encoding that is stored. The result is that two mementos with the same content may produce different raw memento hashes based on the tool they were archived with. Further, even the most basic change to the raw memento by replay (via URL rewriting) will lead to a different hash being computed on the client-side. This implies that comparing the original raw memento hash and the replayed memento hash would not be effective (for some media types) and other approaches should be explored.
Generating repeatable fixity information and using it to ensure that archived resources are valid will help to establish trust in web archives. Part of the problem, though, is the lack of standard techniques that users can apply to verify the fixity of replayed archived web pages [73][74][75]. Jinfang Niu [76] mentioned that none of the web archives declare the reliability of the archived content they preserve, and some archives, such as the Internet Archive and Government of Canada Web Archive, have a disclaimer [77] stating that they are not responsible for the reliability of the archived resources.
Kuhn et al. [78] define a trusty URI as a URI that contains a cryptographic hash of the content it identifies, thereby associating the name (URI) of the page with its content (hash). With the assumption that a trusty URI, once created, is linked from other resources or stored by a third party, it becomes possible to detect if the content that the trusty URI identifies has been tampered with or manipulated on the way (e.g., to prevent man-in-the-middle attacks [79]). In their second paper [80], Kuhn et al. introduce two different modules to allow creating trusty URIs on different kind of content, but this is limited to RDF graphs and byte-level content. No modules have been introduced for HTML documents or for complete web pages (i.e., computing hashes on a base HTML file and all its embedded resources).
In theory, trusty URIs should work for raw (unrewritten and unexecuted) composite mementos. As described in Kuhn et al. [78], trusty URIs could be computed for each embedded resource, and then a trusty URI would be computed for the enclosing resource (that contained references to the embedded resources). However, there are several reasons why this is an unlikely solution for web archives. First, the various web archives already have their established URI-M naming conventions, and placing the hash value of the HTTP entity in the URI-M itself would represent a significant engineering and branding change that currently seems unlikely to be made. Second, computing fixity on HTML pages would be difficult: replayed HTML changes over time, both by including the date and other replay metadata in the HTML comments (effectively making each replay unique), as well as by evolving JavaScript injected into the HTML for improved replay (e.g., JavaScript improvements that allowed cnn.com to be successfully replayed after approximately four years of failure [81]). Third, if fixity was computed on raw resources, the links to those resources would have to be rewritten, meaning they would no longer be raw. Fourth, as will be shown in Section "Archive-Level Changes", embedded resources often change on each replay, so computing fixity on a composite memento is not straightforward (i.e., even if the fixity of individual embedded resources can be verified, the set of embedded resources comprising the composite memento will change). Lastly, we would like to be be able to compute fixity on archived pages without depending on their opting into trusty URIs.
There have been several exploratory studies regarding publicly storing the hash values of archived resources in either blockchains or in other web archives. All such projects remain proofs-of-concept and have not seen widespread adoption. In the ARCHANGEL project [82,83], Collomosse et al. compute hashes on digital documents and store them in the Ethereum blockchain. The documents are not necessarily stored in web archives, and the emphasis appears to be on documents that can be expected to remain static (e.g., video files). In the WARChain project [84,85], hashes are computed server-side by the web archive and then stored in the EduPoS blockchain. The hashes are computed on the text in HTML documents, as extracted by the popular BeautifulSoup library [86]. Neither ARCHANGEL nor WARChain attempt to deal with the complexities of composite mementos. In the Synchronic Web project [87], simultaneous publishing web content and hashes of the web content in a blockchain is proposed. The use of this blockchain content has been proposed to support web archives [88], but while performance has been evaluated, to date no consideration has been made in how the playback interface transforms the content (i.e., it is suitable for the scenario depicted in Fig 1 but not the web archive scenario depicted in Fig 2). In our prior work on the Archival Fixity Server project [89], we utilize an archival fixity server that publishes manifests for individual URI-Ms and how their hashes were computed (so they can vary per resource). These manifests are then given trusty URIs (so the manifest's integrity can be verified), and then the manifests are pushed into multiple, different web archives. This allows for verification of individual URI-Ms, but does not address how a set of URI-Ms were assembled into a composite memento for a particular replay of an archived page. In other words, even if we can verify the integrity of the individual resources, there is no mechanism to verify that a particular set of archived resources in a composite memento was ever witnessed in the past (cf. temporal violations [32]).

Web archiving security issues
Few scholars have focused on security issues related to web archives and replay of mementos. To emphasize the importance of verifying the fixity of archived pages, we briefly summarize some related efforts below. For a more detailed overview and discussion, we refer to Aturban's PhD thesis [33]. Lerner et al. [90] discovered several vulnerabilities in the Internet Archive's Wayback Machine that could be leveraged to modify the rendered memento in a browser and therefore prevent generating repeatable fixity information on replayed mementos: • Archive-Escapes: Some web links are generated by JavaScript at replay time and therefore not rewritten by the archive, which leads to them being loaded from the live web. Whoever owns and controls the original server that hosts the live resource linked from the archive can inject malicious code (e.g., JavaScript) to change the client's rendered view of the archived page. Brunelle et al. [91] provided some additional examples that illustrate the effect of live resources linked from archived pages.
• Same-Origin Escapes: Malicious code in a third-party <iframe> could be ingested into a web page before it is archived. While this <iframe> cannot modify the main HTML file on the live web due to the Same-Origin Policy, the Policy becomes ineffective once the page is archived and the main HTML file as well as the <iframe> are loaded from the archive's domain for replay. Both scenarios can also be combined for malicious purposes.
• Anachronism-Injection: If a composite memento contains a resource that has never been captured by the archive, whoever has access to the original server hosting that neverarchived resource can publish a malicious version of it on the live web and submit it to the archive. The archive will capture the malicious resource and embed it in the composite memento.
The Internet Archive, in response to Lerner's suggestions, addressed some of these issues by establishing the Content-Security-Policy HTTP header [18] that notifies a user-agent (e.g., a web browser) to not load resources except from specified domains (e.g., an archive's domain). However, the problem of having live web resources linked from archived pages may still occur in the Internet Archive and other archives because there are several methods that can load live web resources into archived pages as explained by Nelson [19,23].
Cushman and Kreymer [92,93] created a shared repository [94] of seven potential threats in web archives. Some of these threats, for example, the Archive-Escapes scenario, overlap with those highlighted by Lerner et al. [90]. However, additional threats outlined by Cushman and Kreymer that may affect the ability to generate repeatable fixity information for mementos are: 1. Showing different page contents when archived: An original server can notice when a web page is being crawled by an archive. At this time, the server can reply with content that is different from the content of the page it would otherwise serve. It is also technically possible that a page is designed so that it knows when it is being replayed from an archive, and shows different content, accordingly.
2. Banner spoofing: Malicious code can be used to change the appearance of the archive's banner. This scenario is similar to Lerner et al.'s Archive-Escapes but focuses on web archives' specific banner.
Rosenthal et al. [95] described several threats against the content of digital preservation systems such as web archives. The authors argue that designers of archives must be aware of threats, such as media failure, hardware and software failure, communication errors, failure of network services, etc.
Watanabe et al. [96] described many of the same vulnerabilities and possible remedies as Lerner et al. [90] and Cushman and Kreymer [94], but provided a broader study of "web rehosting" services. Out of the 21 total rehosting services they investigated, 18 had at least one vulnerability. These rehosting services include privacy proxies (e.g., proxysite.com), language translation services (e.g., translate.google.com), and web archives, all of which have the basic structure of taking the URL of a page and passing it as an argument to a service at another URL.

Methods
The main goal of our work is to observe the playback of mementos over time and try to understand the challenges in generating repeatable fixity information (e.g., hash values) on replayed mementos. Our initial assumption was that mementos should not change over time, or in other words, we should always be able to calculate the same hash each time the same memento is downloaded. However, we found that this is not always the case. To study this further, we wanted to identify and quantify the types of changes that we observed.
We selected mementos from the 17 public web archives listed in Table 1 and downloaded each memento on 39 different days. To determine if the memento had changed over time, we used Merkle trees [97] to generate a single hash value for each downloaded memento and used these hashes to identify when there were differences for the same memento. Fig 6 shows an overview of our general process. We will briefly discuss each of the steps 1-3 here (step 4 will be discussed in the next section). Additional details on our methodology and results are available in Aturban's PhD thesis [33].
Each memento we evaluate is actually a composite memento, made up of the base HTML page and all embedded resources used to render the page. We compute hash values on all of the elements of the composite memento, including a selection of the HTTP response headers (those that should not change) and the URI-Ms of the resources themselves. To set a baseline, we compute hashes on the content of raw memento resources (without archive modification). To account for additional resources loaded at playback time (e.g., those loaded by JavaScript), we also compute hashes on the replayed content. All of these hashes are combined into a single hash for the composite memento using Merkle trees, as will be described later. This hash can be viewed as a value that summarizes the memento's content at a particular point in time.
When there are differences in consecutive hashes, the use of Merkle trees will allow us to analyze the individual components to help determine where the change occurred.

Step 1: Collect a dataset of mementos
In November 2017, we collected a dataset of 16,627 mementos of 3,698 unique URI-Rs (original resources) from the 17 public web archives shown in Table 1. Our technical report [98] describes in detail the methods we used to create this dataset. We provide a summary of the process here.
We wanted to build a dataset of mementos that spanned a wide range of web page types and web archives, including those that used different playback software. We set some initial collection targets for URI-Rs: • at least 200 URI-Rs per archive • include some well-known web pages • include web pages at multiple path lengths (i.e., some top-level, some deep links) • include web pages likely to contain embedded resources, such as images, CSS, JavaScript Table 3 shows the final numbers of selected URI-Rs and URI-Ms per archive, and  Table 4 provides the full distribution of URI-Ms over archive and time. We were able to collect at least some mementos from each year between 1996-2017. Fig 8 shows the median number of resources (e.g., images, CSS/JavaScript files, and iframes) comprising composite mementos in our dataset per year. As expected, the figure indicates that web pages contain fewer resources in early years 1996-2006 as compared to recent years. Overall, there was a mean of 42 embedded resources per page, with a median of 32 resources per page.

Step 2: Download rewritten/raw composite mementos
We downloaded the 16,627 mementos 39 times during the 422 days between November 16, 2017 and January 11, 2019. All downloaded content was stored in WARC files, with a total size of 1.46 TB. Full details of how we generated the WARC files are available in Chapter 6 of Aturban's PhD thesis [33].
We used Squidwarc [99,100] to download mementos. Squidwarc uses Google Chrome in headless mode (Headless Chrome) to render mementos. Headless browsing allows loading an entire web page faster as it runs without the need of the UI. Another powerful feature of Headless Chrome is its ability to execute JavaScript, which leads to the discovering of URIs to embedded resources that otherwise would not be discovered by tools that do not execute Java-Script, such as Wget [101].
Each time we downloaded a memento, we created two files, rewritten.warc and raw.warc. We replayed the memento and stored all of the resources needed to appropriately replay a composite memento, including those loaded by JavaScript, in rewritten.warc. To allow for content comparison between downloads, we downloaded the raw memento for each discovered resource and stored these in raw.warc. This is because the content of rewritten resources will often be different each time they are replayed, due to the archive modifications discussed earlier. So we do not want to use these for comparing mementos because we expect the contents to change upon each replay.
Our intuition is that we should be able to generate repeatable hashes on raw mementos, but we have found instances where this is not the case. Some archives respond to raw memento requests with an altered (or a custom) HTML base file, not the raw content. In other cases, raw mementos might be modified for security reasons (e.g., applying Email Address Obfuscation [102]) by a third-party service used by archives. So while using the raw mementos is frequently useful for establishing a baseline for comparison, there are cases where the archive will not or can not return the page as originally archived, including webcitation.org, collectionscanada.gc.ca, and archive.is. These cases will be explored further when we detail the types of changes we discovered in our study. Step

3: Generate aggregated hash values
Our goal is to generate a single aggregated hash value (i.e., root hash) for each downloaded composite memento. Instead of creating hashes of the entire rewritten.warc and raw.warc files as a whole, we decided to break the process down to smaller components so that when changes in the hash value occurred, it would be easier to identify the changed component.
Using hashing, we wanted to be able to track the following components of a composite memento: • S: The set of all resources (the base HTML file and embedded resources) that comprise a composite memento.
• C: The returned HTTP status code to a memento request • H: The set of HTTP response headers that we do not expect to change, namely Memento-Datetime, Content-Type, Location, and all original response headers that start with X-Archive-orig-.
• R: The returned HTTP entity body of a memento.   109 107 108 105 109 107 106 109 107 107 109  archive.bibalex.org 199 -  The attribute S is associated with each composite memento while the attributes C, H, URI-M, and R are associated with each individual memento. Ideally, when replaying a memento at different times, each attribute defined above would always have the same value. For individual resources and simple composite mementos, this can often be true. However, for pages with JavaScript and/or large |S|, this is often not true.
For each resource contained in S, we create a hash on the HTTP entity body R, a hash on the URI-M, and a hash on selected HTTP response headers H that are not expected to change. We include the URI-M to identify if a resource in a composite memento has redirected to a different memento. We include the Memento-Datetime HTTP header to identify if the datetime of an embedded resource has changed (could occur if the original memento was unavailable at playback time), Location HTTP header to identify if mementos were served from different locations, the Content-Type HTTP header to identify if the format of the resource changed, and original HTTP headers that begin with X-Archive-orig-. In particular, there are certain HTTP headers that we do not want to include, such as Date because its value changes each time the memento is accessed.
To create the root hash for each downloaded composite memento, we use rewritten.warc and raw.warc to build Merkle Trees [97]. A leaf node of a Merkle tree contains the hash of data, while a non-leaf node contains a hash of its children nodes' hashes. As Fig 9 and Algorithm 1 show, our technique of generating the root hash of a memento is based on four Merkle trees (marked in different colors), where the output of one Merkle tree becomes input to another Merkle tree. There are multiple ways to generate a single aggregated hash, but we chose to use the Merkle tree because its design is well-suited for generating a hash of hashes  For each resource in rewritten.warc and raw.warc, a Merkle tree (marked in brown in Fig 9) is built on the HTTP response headers of a resource. For instance, the values Hash 5 and Hash 8 are generated on the HTTP response headers of Resource A and Resource B , respectively.
Next, another Merkle tree (marked in blue in Fig 9) is used to calculate the hash of each resource. The input to this Merkle tree includes (1) the resulting hash value for the HTTP response headers generated from the previous step (e.g., Hash 5 ), (2) the hash of the HTTP entity body of the resource (e.g., Hash 6 ), and (3) the hash of the resource's URI-M (e.g., Hash 7 ) After this step, we should have a single hash for each resource in rewritten.warc and raw.warc (e.g., Hash 3 of Resource A and Hash 4 of Resource B in Fig 9). The next step is to create a Merkle tree (marked in red) that consists of all resources' hashes in each WARC file. This step will result in only two hashes: one hash for rewritten.warc (e.g, Hash 1 ) and the other hash for raw.warc (e.g, Hash 2 ).
The final step is to calculate the final hash (i.e., Root Hash) using a Merkle tree (marked in green in Fig 9) where the input of this tree is the hash of rewritten.warc and the hash of raw. warc. The resulting hash can be considered as a summary of the content of a memento at a particular time.
The Merkle trees hold all possible combinations that can be involved in hash calculation, but not all of these combinations will be included in the hash calculations, because we are not always able to obtain the same information from all archives (due to differences in how mementos are replayed). For archives that provide raw mementos, everything from raw.warc will be included in the hash calculation, along with certain elements from rewritten.warc, including the hash of the HTTP status code, hash of the URI-M itself, hashes of selected HTTP headers, and hashes on entity bodies that are not text-based (e.g., images, PDF). We do not include text-based entity-bodies (e.g., HTML, CSS, JavaScript) from rewritten.warc because archives often inject code for archive banners, rewrite URLs, and make other modifications upon each replay (e.g., the replay date of playback shown in Fig 4). As a result, when we claim that an entity has changed, that determination was based on, for example, raw HTML or both replayed and raw JPEG, but never replayed HTML, since replayed HTML is expected to change. We also exclude elements from any live resources (i.e., non-mementos) found in rewritten.warc, because these are usually archive banners and other archive metadata that we expect to change on each replay. (Unexpected live resources are a point of concern, but harken back to the security issues from Lerner et al. [90] discussed in Related Work).

Identifying types of changes on the playback of mementos
For each memento, we have 39 root hash values that were generated after each replay. We would expect to see the same content, and thus the same hash values, each time. We compared consecutive hashes: the first hash is compared with the second hash, the second hash is compared with the third hash, and so on. Each time two consecutive hash values were different, we identified one type of change causing different hashes for the same memento. In general, the change could occur on the base HTML file, embedded resources (e.g., images), or HTTP response headers.
As defined in the previous section, we use the notation S to represent the set of all resources in a composite memento, and for each resource in S, C represents the HTTP status code, H represents the selected set of HTTP response headers, and R represents the HTTP entity body of the resource. embedded resource that is loaded is not determined until playback and can change each time the page is reloaded.
Although it falls outside the temporal range of our study (2017-2019), we know of another high-profile archive replay system upgrade that would impact S. From November 1, 2016 through sometime in Spring 2022, the Internet Archive's Wayback Machine was unable to fully replay the top level cnn.com page archived after November 1, 2016 due to cross-origin assumptions that were invalidated by CNN's JavaScript running at archive.org and not cnn.com [81]. When the Internet Archive upgraded their replay software sometime in Spring 2020, the replay resumed correctly. The embedded resources for the affected cnn.com pages were archived correctly, but replay failures meant those resources would not be included in S for a composite memento generated between late 2016 through early 2020. After the Internet Archive's upgrade, any new replay of those affected pages would see an avalanche of new resources in S, which although is a more faithful replay of the archived page, would result in invalidation of any hashes computed on the composite memento.

ΔC = (S, C', H, URI-M, R):
The HTTP status code of one or more resources comprising a composite memento has changed.
One of the reasons for changes in the HTTP status code of a resource is the technique web archives use to crawl or capture web pages. It is not uncommon to encounter a situation where the embedded resources within a composite memento have different values for Memento-Datetime. This is because archives may start serving mementos even before all of their embedded resources have been crawled. Recall that archive web crawlers place all discovered but not yet crawled URIs in a queue so that they can be processed later. Thus, it is likely that we see some archived resources with 404 Not Found at a particular time that then become 200 OK when revisited at a later time. Found. This server upgrade is opaque to interactive users (browsers silently handle all HTTP redirects), but could impact hashing technique that relied on C.
It is also possible that different archival redirects for the same HTTP request eventually return different HTTP status codes.  In general, we identify two URI-Ms as "different" if either their URI-Rs or Memento-Datetimes are different. However, there are other cases where two lexicographically different URI-Ms canonicalize to the same value. For example, we requested the same URI-M http://perma-archives.org/warc/20170303200708/http://id.loc.gov/ at two different times on March 27, 2018 and July 08, 2018. The only difference between the two responses is that the archive perma.cc started serving over HTTPS (instead of HTTP) on or around July 08, 2018. In such cases, without canonicalizing URI-Ms (e.g., HTTP = HTTPS), they will produce different hashes.
Archives may serve requested mementos in iframes. For example, webarchive.org.uk began supporting the option mp_ for loading the archived content into an iframe. Therefore, any new requests for mementos from this archive will result in 302 redirect to a URI-M that has mp_ after the timestamp.
The URI-M change may also be referred to as TimeMap change, where archives respond, at different times, with various TimeMaps of the same URI-R. The TimeMap inconsistency occurs for reasons including deduplication and redaction techniques of mementos, archival restructuring, and transient errors [104]. The TimeMap change causes requests of the same memento to redirect to different URI-Ms.

ΔR = (S, C, H, URI-M, R'):
The returned HTTP entity body of one or more resources comprising a composite memento has changed.
We saw an example of the Representation change when we requested the raw content of the same memento multiple times from perma.cc. We were expecting to always be presented with same raw content, but we noticed a different HTTP entity returned each time. The actual change in the returned content was not caused by the archive, but by Cloudflare, a third-party service used by the archive. This service modifies the HTML content being returned to the client by applying Email Address Obfuscation [102] to hide any email address included in the content and help to prevent spam. Fig 14 shows another example of HTTP entity change. Recall that archive.is does not serve raw mementos, but provides a ZIP file with the rewritten content (though without banners or other headers that would change on each replay). At replay time, this archive inconsistently Such behavior might be applied by the archive to prevent redirects to the live web, but the rewritten content affects the hash calculation, resulting in different hash values.
• webharvest.gov: Fig 16 (webharvest.gov) shows why vefsafn.is inserts a custom HTML page in place of the redirect: by strictly not modifying any of the HTTP response including the Location HTTP response header, replaying the original redirect leads the client out of the web archive and to the live web, even if the target resource is in fact archived.
• archive.org: As shown in Fig 16 (archive.org), the Internet Archive correctly replays the raw entity (marked in blue), while rewriting the Location HTTP response header. This keeps the user in the web archive while returning the original entity.
The custom HTML page returned by vefsafn.is might prevent generating repeatable hashes because the returned HTML page has content that is expected to change (e.g., the https://doi.org/10.1371/journal.pone.0286879.g015 banner) as the web archive software is updated. Because we do not consider any resources retrieved from the live web in hash calculation, the 302 Redirects to the live web by webharvest.gov do not affect generating repeatable hashes. We exclude any redirects to the live web in hash generation because live web resources are not expected to remain unchanged. By that same line of thought, it would be possible to implement a stricter policy that considers any redirects to the live web as a change. The technique that archive.org uses to respond to raw requests does not affect hash calculation because the archive returns the raw HTML entity and only rewrites the Location HTTP response header.

ΔURI-M, ΔR = (S, C, H, URI-M', R'):
The URI-M and Representation change occurs when one or more resources in the set comprising a composite memento has both a lexicographically different URI-M as well as change in the HTTP entity body. This commonly happens when the web archive redirects a request to a resource with a different Memento-Datetime, thus resulting in a different URI-M, and that resource has changed relative to the prior request. For example, Fig 17 shows Fig 19 look the same, but we were able to identify differences between the two images using the image comparison tool Resemble [105].

Timeout/Not resolved
The Timeout/Not resolved change occurs when one or more HTTP requests of resources in the set comprising a composite memento has a connection timeout error. In general, this  type of change refers to a situation where there is no HTTP response returned from the server, not even an HTTP status code.

Almost 90% of the mementos have at least two different hashes
We calculated the hash values for 39 downloads of each of our 16,627 mementos. Table 5 shows the number of mementos per archive that have at least two different hashes, always produced the same hash, and always produced a different hash. Almost 90% of the mementos (14,707, 88.45%, or approximately seven out of eight) have at least two different hashes, https://doi.org/10.1371/journal.pone.0286879.g019 including all mementos from seven of the archives. This is even after our efforts to ensure that URL rewriting and archive-injected resources would not influence the computed hash. Fig 20 shows the number of distinct hash values per memento over all archives. The blue bar on the left indicates when there is only a single hash value (same as "always produced the same hash" in Table 5). Our intuition was this would be the most common case, but this occurs for 11.55% (1,920) of our mementos. Of the 1,920 mementos that always produced the same hash, the highest number (587) was from webcitation.org (Table 5), which archives little to no JavaScript, resulting in poor replay of archived web pages but an increased chance of a persistent hash value. Even more alarming is the red bar on the right, which indicates 39 different hash values, meaning a different hash value was computed every time the memento was replayed (same as "always produced a different hash" in Table 5). This occurred for over 16% (2,670) of our mementos. In other words, the conventional hashing approach to determine fixity in digital preservation settings works properly only for about one out of every eight mementos in our sample. Fig 21 illustrates how the pool of mementos that had at least two distinct hashes increased over time. About 40% of the mementos produced different hashes after download 2 on November 18, 2017. Then, after 37 more downloads (within 420 days), the cumulative percentage had increased to 88.45% by January 11, 2019. This illustrates our observation that the chance of getting different hashes for the same memento increases over time. We will examine some reasons for these hash differences in the next section.

Quantifying the types of changes
To understand the types of changes that might cause different hash values for the same memento, we compared consecutive hashes for each memento. Each time any two consecutive hash values are different, we identify at most one type of change causing the different hashes, even though multiple categories might apply. We look for changes in the following order: Set, Status, URI-M or Headers (neither affects HTTP entity body), Representation, and URI-M and Representation. For example, if we detected that the set of resources comprising a memento at time t 0 varies from the set at t 1 , then this change is marked as Set and other categories of changes are not considered, because if sets are different, it implies that hashes will be also different. Similarly, if sets are identical, but there are some differences in the resources' HTTP status codes, then we assign the type of change Status, and no other types are considered. Fig 22 shows the types of changes affecting all mementos in each download. On average only about one-third of the mementos have changes when comparing consecutive downloads. Download 11, which was the first download in 2018, has the most mementos with changes, 7,557 (about 45% of the mementos). We can see that the Set change is the most prevalent, but recall that if we detect this change, we stop looking for other types of changes. A Set change means that we may have observed new resources, a resource was replaced with another, or a previously seen resource has gone missing. After Set, the next most frequent types of changes are Representation, Timeout, and Status.
To drill-down a bit, we show the percentage of mementos with each type of change in each download for each archive in Fig 23. As expected, a large percentage of mementos from each archive are producing different hashes because of the Set change. This is mainly caused by dynamic resources generated after executing JavaScript, such as the three different images loaded by JavaScript on the www.fws.gov memento shown in Fig 10. The Representation change is detected mainly in just a few archives. We mentioned earlier that for archive.is we use a rewritten version to create the hash since the original raw source is not available and that in rewritten links in index.html, the archive often refers to itself with different domains. For instance, in download 11 archive.is used archive.li, in download 12 it used archive.is, and in download 14 it used archive.today. This caused the comparisons of downloads 10 vs. 11, 11 vs. 12, and 13 vs. 14 to show the Representation change because of the different rewritten hostnames for archived links. The archives vefsafn.is, webcitation.org, and perma-archives.org are also tagged with the Representation change. For vefsafn.is it is because the archive returns a rewritten page with HTTP 200 for requests for raw mementos, as mentioned earlier. The webcitation.org archive does not provide raw content, so our hash calculations are made on the rewritten content, which results in Representation changes in every download. The perma-archives.org archive uses a third-party service (Cloudflare) to prevent spam, which modifies the content being returned to the user to obfuscate email addresses. Thus, every memento that contains an email address in the HTML will have different content on each replay. For example, the mementos of the NLI collection in europarchive.org became unreachable in download 21. We discovered the new location starting from download 24 (i.e., internetmemory.org presented in the next row) [106]. Then, these NLI mementos in internetmemory.org became inaccessible, returning 403 Error, at download 31. We manually discovered the new location archive-it.org of these migrated mementos at download 39. As another example, we noticed that the performance of the webarchive.org.uk archive before download 20 is totally different from the behavior after download 20. From the WARC files, we found that the archive had upgraded the replay service to a new version of PyWB. (We discovered the change because the archive began supporting the mp_ option for loading the archived content into an iframe.)

Archive-level changes
In general, we should be cautious in interpreting the heatmap because multiple consecutive cells with similar colors do not always imply good stable behavior. For example, there are some periods where all mementos from particular archives were not available, like mementos from perma-archives.org in downloads 25, 26, and 27 (i.e., returning 500 Internal Server Error between July 31, 2018 and August 19, 2018), so the white cells indicate no changes in mementos, but they were actually unavailable during

PLOS ONE
those three downloads. We added annotations to the heatmap to explain some of the major archive-level changes.

Changes on each download
To further investigate the amount of change seen on each download, we constructed an image indicating the unique hash values calculated each time we download the set of mementos from an archive. Fig 25 is an example showing the hash values calculated for each resource over all 1,566 URI-Ms from the Internet Archive for download 1. Each unique hash value is represented by a square in the figure. The squares are stacked such that the square in the bottom left corner represents the hash of the first resource that was downloaded. In download 1, 40,500 unique hashes were calculated. The figure indicates that less than 50% of all the unique hashes that will be calculated over the 39 downloads were seen in download 1. Fig 26 shows snapshots of an animation of this figure over all 39 downloads. The gray squares indicate that a hash that had previously been calculated was not seen in the current download (i.e., a change had occurred). We also include animated GIFs in S2-S6 Figs for hashes from archive.is, webarchive.loc.gov, webarchive.proni.gov, webcitation.org, and webarchive.org.uk, respectively. Animated GIFs for all archives in this study are available at https://github.com/oduwsdl/mementos-fixity/tree/master/hashing_techniques/. The most lenient policy for detecting changes would be to look only at the resource's HTTP entity body and use that as the basis for calculating the hash values. Fig 28 shows

Discussion
Our 14-month study of the replay of over 16,000 mementos has shown that almost 90% of the mementos produced at least two different hash values, which we presume to be contrary to the expectations of non-expert users of web archives. This is despite our efforts to prevent archiveinjected and other live web resources from affecting the computed hash. Our approach to computing hash values for mementos in this work took into account our preliminary study [107] on how to carefully calculate fixity information for composite mementos: Even with these suggested guidelines, there are additional considerations that should be taken into account. First, we have demonstrated that many archived web pages are in flux, meaning that the exact resources loaded by a composite memento at replay time may depend upon the availability of the archive's holdings, which can be subject to available computation and storage resources, as well as accession and policy changes. Related to this, fixity information computed for mementos that experience transient errors on replay should not be used for comparison. Further, because of the effect of JavaScript and changes in TimeMaps, there may be more than one correct aggregated hash value calculated on the playback of a composite memento over time. For example, in the composite memento shown in Figs 10 and 11, a set of three hash values could cover all possible states. These sets of hashes could be stored in other web archives [89], or in a blockchain [82,84] or other distributed ledgers. Client-side archiving tools (e.g., ArchiveWeb.page [108]) could be used to create a WACZ file by recording multiple reloads of a web page to ensure inclusion of all of the resources needed by multiple replay attempts of the composite memento in playback tools (e.g., ReplayWeb.page [109]). However, this potential solution would not work for composite mementos that produce different hashes on every replay, such as those indicated by the red, rightmost column in Fig 20. Unfortunately, a number of challenges prevent the application of conventional hashing techniques, such as blockchain. As we have shown, the archived pages are frequently different on each replay, so it is not always possible to agree on what the inputs for hashing functions should be; there is not even a clear baseline (e.g., first replay) with which we can compare. Large-scale auditing of general web pages will be difficult to automate because of JavaScript embedding different resources, the web archives' holdings changing (either through patching the archive or regular ingest and deaccession), new releases of replay engines and related infrastructure (web servers, caches, and proxies), as well as transient errors in the network transport. Unfortunately, at scale, the engineering realities of constant churn of changed and different resources on each replay can be impossible to distinguish from malicious manipulation of the archives' holdings themselves.
Since conventional hashing techniques cannot be used to verify the fixity on the playback of mementos, we should consider new approaches, or archive-aware hashing, for generating repeatable fixity information on the playback of archived web pages. This would need to take into consideration the fact that it is likely impossible for an archive replay system both to allow JavaScript execution and to have deterministic replay of archived composite mementos. The findings presented in this paper indicate that any future system should consider our suggestions, however they are based largely on the particular mementos that we observed. We do not have an exhaustive exploration of all of the different types of mementos that might be available, so more criteria may be required.
One potential approach for verifying fixity on composite mementos could include clientside replay systems, such as InterPlanetary Wayback (IPWB) [110,111]. IPWB is a web archive replay system based on InterPlanetary File System (IPFS) [112] that promises replay of the original raw mementos with minimal to zero server-side rewriting. The content-addressable nature of the peer-to-peer file system, IPFS, ensures the fixity of the original content. However, some rewriting is inevitable to prevent live-leakage and for faithful replay of archived web, as discussed earlier. IPWB accomplishes this by using a Service Worker [113] module called Reconstructive [114]. This module resides in the web browser and intercepts all the requests in its scope to make necessary changes in the raw memento response received from the archive server. Effectively, it shifts the responsibility of rewriting from the server to the client. The Reconstructive module also comes with an unobtrusive banner [58], a custom HTML element [115], which is injected in the navigational HTML pages by the Service Worker. Such a system has the potential to integrate client-side fixity verification component in the Service Worker module before any rewriting is performed and corresponding report inclusion in the banner. While this system has the potential to integrate a fixity component, it does not eliminate all of the complexities involved in fixity verification of composite mementos.
Even with the difficulty in computing fixity information for replayed composite mementos, we learned a great deal about the 17 public web archives in this study, not the least of which was tracking the movement of four of the 17 web archives [106]. This suggests that there is utility in periodic monitoring of the playback of the same mementos from multiple archives to globally detect such systemic changes, a sort of web archive observatory service. In addition, there is standards work to be done to create a machine-readable method to describe the provenance of archived web pages as they move from one archive to another.

Conclusions
This study set out to address the following questions: • What are the types of changes in the playback of composite mementos that prevent or affect generating repeatable hashes?
• How many times does each identified type of change occur?
• For each composite memento, will excluding certain embedded resources that we expect to change over time help to generate repeatable hash values? Would, for example, excluding resources added by the archive produce more consistent hash values?
Conventional hashing techniques can be applied to server-side files: zip, tar, WARC, or WACZ. However, general purpose web archives like the Internet Archive do not allow for download of WARC files, and likely never will. Most archives only permit user access via a replay engine, such as the Internet Archive's Wayback Machine or PyWB. To enable third party auditing of a web archive, including potentially uncooperative web archives, we must consider pages as rendered through the replay engine.
We conducted a 14-month study on the replay of 16,627 mementos from 17 public web archives. By replaying the same mementos over time and computing Merkle tree based hash values, we were able to identify seven major types of changes that could cause the same memento to produce different hash values. We found changes in the Set of resources comprising a composite memento, the Status code of one or more resources, HTTP Headers that were not expected to change, the URI-M of one or more resources due to redirection, the Representation of the HTTP entity body in one or more resources, both the URI-M and Representation of one or more resources when redirection led to a different entity body, and experienced Timeout when one or more resources encountered a connection timeout error.
In our study, we attempted to ensure that resources added or modified by archives to facilitate replay were not included in the calculation of fixity information. Even with these considerations, it was not possible to generate stable fixity information on the replay of composite mementos over time. Almost 90% of our studied mementos produced at least two different hash values over the study period, and over 16% of the mementos produced a distinct hash value on each of the 39 replays.
During our study, we also found that four of the archives moved to new hosting locations and archival services during the time period under observation. Other archives upgraded their replay systems, resulting in the inability to accurately compare the fixity of mementos generated before and after the change.
The lessons learned in this study could be used to produce a more archive-aware hashing function that takes into account changes caused by JavaScript and patterns in computed hash values. We also suggested that a web archive observatory could be developed using such techniques to monitor the state of public web archives in a global scale.